Developments in Digital Preservation at the University of Illinois: The Hub and Spoke Architecture for Supporting Repository Interoperability and Emerging Preservation Standards

Authors

  • Thomas G. Habing
  • Janet Eke
  • Matthew A. Cordial
  • William Ingram
  • Robert Manaster

Abstract

Funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP), the ECHO DEPository Project supports the digital preservation efforts of the Library of Congress by contributing research and software to help society GET, SAVE, and KEEP its digital cultural heritage. Project activities include building Web archiving tools, evaluating existing repository software, developing architectures to enhance existing repositories’ interoperability and preservation features, and modeling next-generation repositories for supporting long-term preservation. This article describes the development of the Hub and Spoke (HandS) Tool Suite, built to help curators of digital objects manage content in multiple repository systems while preserving valuable preservation metadata. Implementing METS and PREMIS, HandS provides a standards-based method for packaging content that allows digital objects to be moved between repositories more easily while supporting the collection of technical and provenance information crucial for long-term preservation. Related project work investigating the more fundamental semantic issues underlying the preservation of the meaning of digital objects over time is profiled separately in this issue (Dubin et al., 2009).

LIBRARY TRENDS, Vol. 57, No. 3, Winter 2009 (“The Library of Congress National Digital Information Infrastructure and Preservation Program,” edited by Patricia Cruse and Beth Sandore), pp. 556–579. (c) 2009 The Board of Trustees, University of Illinois

The Hub and Spoke Interoperability Architecture

HandS is a suite of tools built to support moving content between repositories while generating and maintaining PREMIS-based technical and preservation metadata. It emerged out of project activities to evaluate open-source repositories in which we found typically low out-of-the-box support for interoperability and low support for emerging preservation standards. The next section describes the impetus and rationale behind the HandS development in more detail.

Hub and Spoke Background

The development of the Hub and Spoke (HandS) Architecture was a natural outcome of activities required to develop a test bed for evaluating multiple repository systems. During the development of our test bed we found ourselves developing a number of different though similar customized scripts and programs for exporting digital packages from one repository system and importing those digital packages into another repository system. The repository systems themselves had very little in common that would facilitate this task. They typically supported different descriptive metadata formats, had no support for provenance metadata, offered little or no support for technical metadata, and employed different means of identifying the files constituting a package. The development of an in-house tool to facilitate data interoperability between multiple repositories without the need to develop customized mechanisms for each repository combination therefore soon emerged as a key task to support our repository evaluation activities.

At the same time, we were also coming to a more structured understanding of emerging digital preservation standards, specifically early drafts of An Audit Checklist for the Certification of Trusted Digital Repositories (RLG, 2005; Kaczmarek et al., 2006; Kaczmarek, Habing, and Eke, 2006) and the PREMIS Data Dictionary for Preservation Metadata (PREMIS Working Group, 2005).
We began to see that a formally developed interoperability architecture designed with a focus on providing additional support for retention of provenance and technical metadata could be a valuable and practical project deliverable, and one with immediate application in our own libraries and in other institutions that commonly implement multiple repository systems to manage and preserve digital collections.

The Need for Interoperability and Preservation Support

Institutions Commonly Rely on Multiple Repositories

There are currently many different digital repositories in widespread use, including DSpace, Greenstone, Fedora, EPrints, and CONTENTdm, along with digital archive services like those from OCLC and CDL. There are also many different sources of input into these systems, such as from Web crawlers like Heritrix or packaged content from OCLC’s Web Archives Workbench, as well as numerous digitization and scanning services. It is also not uncommon for several of these systems to be in use within a single institution. If curators wish to share data internally, or with other institutions or consortia, it is very likely that multiple repository systems will come into play. Repository interoperability issues also emerge as institutions update or replace their repository systems, and must migrate content from an existing repository system to its replacement.

Out-of-the-box Repository Interoperability Is Low

Our repository evaluation experiments and our experiences with repositories in production at our own institutions show that the native ability for repositories to interoperate is typically very basic. Almost none of the systems we tested were able to operate with one another beyond a rudimentary level, usually restricted to the OAI Protocol for Metadata Harvesting (OAI-PMH) for Dublin Core.
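
To make that rudimentary baseline concrete, the sketch below builds an OAI-PMH ListRecords request for unqualified Dublin Core and extracts titles from a response. It is an illustration only: the base URL, the function names, and the trimmed sample response are hypothetical and do not come from any repository discussed here. Only the `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace for Dublin Core elements as used inside oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url: str) -> str:
    """Build a ListRecords request for unqualified Dublin Core (oai_dc)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list[str]:
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [el.text or "" for el in root.iterfind(".//dc:title", NS)]


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example object</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The base URL is a placeholder, not a real endpoint.
    print(list_records_url("https://repository.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A harvest of this kind exchanges descriptive records only; on its own it does not move the files that make up a package.
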
If any OAIS concepts are implemented (and few are), such as the use of submission or dissemination information packages (SIPs and DIPs), these implementations vary greatly (Consultative Committee for Space Data Standards, 2002). In an ideal OAIS-compliant world, a DIP from one repository should be a SIP to another. However, in reality, a dissemination package produced by DSpace cannot be used for submission into EPrints. Because of these inconsistencies, achieving any real interoperability between repository systems usually entails some level of custom software development. Further, anytime a new repository is added to the mix, new software will need to be developed in order to accommodate the added repository.

Support for Emerging Preservation Standards Is Low

Few of the current repositories have any explicit support for preservation, such as for collecting preservation metadata as articulated by PREMIS, or activities to support preservation such as format migrations or checksum validations as outlined in the Trusted Digital Repository Checklist. For an institution that deploys several repository systems, a task as simple as performing consistent backups to off-line storage can become complicated by the fact that the systems store their underlying data differently. There may be data stored in relational databases, XML databases, RDF triple stores, and various file systems, all of which must be backed up, and may require different backup techniques.
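
The checksum validation mentioned above can be sketched in a few lines. This is a minimal illustration and not part of HandS: the function names, the choice of SHA-256, and the idea of comparing against a digest recorded at an earlier time are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Compute a hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_fixity(path: str, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Return True if the file still matches the digest recorded earlier."""
    return file_digest(path, algorithm) == recorded_digest.lower()


if __name__ == "__main__":
    # Create a throwaway file standing in for a stored digital object.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example content")
    recorded = file_digest(tmp.name)            # digest stored at ingest (assumed)
    print(verify_fixity(tmp.name, recorded))    # file unchanged since recording

    with open(tmp.name, "ab") as f:             # simulate later corruption
        f.write(b"!")
    print(verify_fixity(tmp.name, recorded))    # digest no longer matches
    os.remove(tmp.name)
```

Because the check depends only on file bytes and a stored digest, the same routine could in principle run against any of the storage back ends listed above once their content is exported as files.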

Similar Resources

Exploring the Concept of Temporal Interoperability

This paper explores a new way of thinking about digital preservation and introduces a new requirement for interoperability that I refer to as temporal interoperability. The concept of temporal interoperability concerns the interoperability of systems and the access to heterogeneous collections over time. The paper discusses recent developments in digital preservation that begin to approach pres...

The Planets Interoperability Framework An Infrastructure for Digital Preservation Actions

We report on the implementation of a software infrastructure for preservation actions, carried out in the context of the European Integrated Project Planets – the Planets Interoperability Framework (IF). The design of the framework was driven by the requirements of logical preservation in the domain of libraries and archives, which include durable and scalable infrastructures for the characteri...

The Metadata Coalface for Digital Repositories

In this paper we examine a range of metadata-related issues facing the developers and maintainers of digital repositories in Australia. We discuss metadata developments in the areas of digital preservation, repository interoperability, and collection-level discovery services in the context of a range of innovative repository projects designed to improve metadata creation, management and sharing...

Identifying Barriers To File Rendering In Bit-level Preservation Repositories

This paper seeks to advance digital preservation theory and practice by presenting an evidence-based model for identifying barriers to digital content rendering within a bit-level preservation repository. It details the results of an experiment at the University of Illinois at Urbana-Champaign library, where the authors procured a random sample of files from their institution’s digital preserva...

Mapping the Knowledge Domain of Digital Libraries in Iran: A Co-word Analysis

This study aimed to map knowledge in the field of Digital Libraries (DLs) in Iran. It is a scientometric study that used social network and co-word analysis methods. A total of 554 research resources, such as books, national and international journal papers, conference articles, and MA and Ph.D. theses in Iran up to 2013, were studied. A researcher-made checklist was used to collect data. Al...


Journal title:
  • Library Trends

Volume 57, Issue 3

Pages 556–579

Publication date: 2009